Feature Selection Method Based on Improved Document Frequency

نویسندگان

  • Wei Zheng
  • Guohe Feng
چکیده

Feature selection is an important part of the process of text classification, there is a direct impact on the quality of feature selection because of the evaluation function. Document frequency (DF) is one of several commonly methods used feature selection, its shortcomings is the lack of theoretical basis on function construction, itwill tend to select high-frequency words in selecting. To solve the problem, we put forward a improved algorithm named DFMcombined withclass distribution of characteristics and realize the algorithm with programming, DFM were compared with some feature selection method commonly used with experimental using support vector machine, as text classification .The results show that, when feature selection, the DFM methods performance is stable at work andis better than other methodsin classification results.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Improved Information Gain Algorithm Based on Relative Document Frequency Distribution

Feature selection algorithm plays an important role in text categorization. Considering some drawbacks proposed from traditional and recently improved information gain(IG) approach, an improved IG feature selection method based on relative document frequency distribution is proposed, which combines reducing the impact of unbalanced data sets and low-frequency characteristics, the frequency dist...

متن کامل

A Real-Time Electroencephalography Classification in Emotion Assessment Based on Synthetic Statistical-Frequency Feature Extraction and Feature Selection

Purpose: To assess three main emotions (happy, sad and calm) by various classifiers, using appropriate feature extraction and feature selection. Materials and Methods: In this study a combination of Power Spectral Density and a series of statistical features are proposed as statistical-frequency features. Next, a feature selection method from pattern recognition (PR) Tools is presented to e...

متن کامل

A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection

Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that ...

متن کامل

Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

Objective(s): This study addresses feature selection for breast cancer diagnosis. The present process uses a wrapper approach using GA-based on feature selection and PS-classifier. The results of experiment show that the proposed model is comparable to the other models on Wisconsin breast cancer datasets. Materials and Methods: To evaluate effectiveness of proposed feature selection method, we ...

متن کامل

Filtering Methods for Feature Selection in Web-Document Clustering

This paper presents the results of a comparative study of filtering methods for feature selection in web document clustering. First, we focused on feature selection methods based on Mutual Information (MI) and Information Gain (IG). With those features and feature values, and using MI and IG, we extracted from documents representative max-value features as well as a representative cluster for a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015